Method and device for vertebra localization and identification

ABSTRACT

A vertebra localization and identification method includes: receiving one or more images of vertebrae of a spine; applying a machine learning model on the one or more images to generate three-dimensional (3-D) vertebra activation maps of detected vertebra centers; performing a spine rectification process on the 3-D vertebra activation maps to convert each 3-D vertebra activation map into a corresponding one-dimensional (1-D) vertebra activation signal; performing an anatomically-constrained optimization process on each 1-D vertebra activation signal to localize and identify each vertebra center in the one or more images; and outputting the one or more images, wherein on each of the one or more outputted images, a location and an identification of each vertebra center are specified.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the priority of U.S. Provisional PatentApplication No. 63/120,693, filed on Dec. 2, 2020, the entire content ofwhich is incorporated herein by reference.

FIELD OF THE TECHNOLOGY

This application relates to the field of medical image interpretationand, more particularly, to a method and device for vertebra localizationand identification.

BACKGROUND OF THE DISCLOSURE

Accurate localization and identification of spine vertebrae inthree-dimensional (3-D) medical images are important for computer-aideddiagnosis of spine disorders. Due to similar appearances of the spinevertebrae, it is difficult to identify the vertebrae with asubstantially high accuracy. Various existing approaches take advantagesof machine learning algorithms to improve the accuracy. However, thoseapproaches are ineffective in incorporating anatomical constraints inthe computer-aided diagnosis.

Thus, there is a need to provide method, device, and storage medium forvertebra localization and identification, to effectively utilize theanatomical knowledge to achieve best accuracy and robustness.

SUMMARY

One aspect of the present disclosure includes a vertebra localizationand identification method. The method includes: receiving one or moreimages of vertebrae of a spine; applying a machine learning model on theone or more images to generate three-dimensional (3-D) vertebraactivation maps {G_(v)} of detected vertebra centers; performing a spinerectification process on the 3-D vertebra activation maps {G_(v)} toconvert each 3-D vertebra activation map G_(v) into a correspondingone-dimensional (1-D) vertebra activation signal Q_(v); performing ananatomically-constrained optimization process on each 1-D vertebraactivation signal Q_(v) to localize and identify each vertebra center inthe one or more images; and outputting the one or more images, whereinon each of the one or more outputted images, a location k and anidentification v_(l) of each vertebra center are specified, whereinv_(l)∈[1, 26].

Another aspect of the present disclosure includes a vertebralocalization and identification device. The device includes a memorystoring a computer program and a processor configured to execute thecomputer program stored in the memory to: receive one or more images ofvertebrae of a spine; apply a machine learning model on the one or moreimages to generate three-dimensional (3-D) vertebra activation maps{G_(v)} of detected vertebra centers; perform a spine rectificationprocess on the 3-D vertebra activation maps {G_(v)} to convert each 3-Dvertebra activation map G_(v) into a corresponding one-dimensional (1-D)vertebra activation signal Q_(v); perform an anatomically-constrainedoptimization process on each 1-D vertebra activation signal Q_(v) tolocalize and identify each vertebra center in the one or more images;and output the one or more images, wherein on each of the one or moreoutputted images, a location k and an identification v_(l) of eachvertebra center are specified, wherein v_(l)∈[1, 26].

Another aspect of the present disclosure includes a non-transitorycomputer-readable storage medium storing a computer program. When beingexecuted by a processor, the computer program performs: receiving one ormore images of vertebrae of a spine; applying a machine learning modelon the one or more images to generate three-dimensional (3-D) vertebraactivation maps {G_(v)} of detected vertebra centers; performing a spinerectification process on the 3-D vertebra activation maps {G_(v)} toconvert each 3-D vertebra activation map G_(v) into a correspondingone-dimensional (1-D) vertebra activation signal Q_(v); performing ananatomically-constrained optimization process on each 1-D vertebraactivation signal Q_(v) to localize and identify each vertebra center inthe one or more images; and outputting the one or more images, whereinon each of the one or more outputted images, a location k and anidentification v_(l) of each vertebra center are specified, whereinv_(l)∈[1, 26].

Other aspects of the present disclosure can be understood by thoseskilled in the art in light of the description, the claims, and thedrawings of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a flowchart of an exemplary method for vertebralocalization and identification consistent with embodiments of thepresent disclosure;

FIG. 2 illustrates exemplary spine images consistent with embodiments ofthe present disclosure;

FIG. 3 illustrates an exemplary method for vertebra localization andidentification consistent with embodiments of the present disclosure;

FIG. 4 illustrates an exemplary pseudo code of an optimization processconsistent with embodiments of the present disclosure;

FIG. 5 illustrates comparison of vertebra localization andidentification results of five datasets processed by four existingmethods and the method consistent with embodiments of the presentdisclosure;

FIG. 6 illustrates comparison of identification rates for each of 26vertebrae of various alternative methods consistent with embodiments ofthe present disclosure;

FIG. 7 illustrates comparison of vertebra localization andidentification results of five datasets processed by four existingmethods and the method consistent with embodiments of the presentdisclosure;

FIG. 8 illustrates comparison of vertebra localization andidentification results of five datasets processed by four existingmethods and the method consistent with embodiments of the presentdisclosure;

FIG. 9 illustrates comparison of vertebra localization andidentification results of five datasets processed by four existingmethods and the method consistent with embodiments of the presentdisclosure;

FIG. 10 illustrates incorrectly localized and incorrectly identifiedvertebrae of three datasets processed by four existing methods and themethod consistent with embodiments of the present disclosure; and

FIG. 11 illustrates a structural diagram of an exemplary device forvertebra localization and identification consistent with embodiments ofthe present disclosure.

DETAILED DESCRIPTION

The following describes the technical solutions in the embodiments ofthe present invention with reference to the accompanying drawings.Wherever possible, the same reference numbers will be used throughoutthe drawings to refer to the same or like parts. Apparently, thedescribed embodiments are merely some but not all the embodiments of thepresent invention. Other embodiments obtained by a person skilled in theart based on the embodiments of the present invention without creativeefforts shall fall within the protection scope of the presentdisclosure. Certain terms used in this disclosure are first explained inthe followings.

Bidirectional recurrent neural networks (Bi-RNN): Bi-RNN connect twohidden layers opposite directions to the same output. With this form ofgenerative deep learning, the output layer can get information from past(backward) and future (forward) states simultaneously.

Spine images: spine images are usually Computed tomography (CT) images.A CT scan is a medical imaging technique that uses computer-processedcombinations of multiple X-ray measurements taken from different anglesto produce tomographic (cross-sectional) images of a body or a sectionof the body, allowing a user to see inside the body without cutting.

Convolutional neural network (CNN): a CNN includes an input layer thatperforms the activation function, hidden layers that performconvolutions, and an output layer that performs the final convolution.The hidden layers do multiplication or other dot product. The activationfunction is commonly ReLU. CNN is a class of deep neural networks, mostcommonly applied to analyzing visual imagery.

Fully convolutional network (FCN): FCN is an architecture employingsolely locally connected layers, such as convolution, pooling, andup-sampling. Avoiding the use of dense layers means less parameters andmakes the networks faster to train.

Stochastic gradient descent (SGD): SGD is an iterative method foroptimizing an objective function with suitable smoothness properties.

U-Net: it is a convolutional neural network for biomedical imagesegmentation. The network is based on the fully convolutional networkand its architecture was modified and extended to work with fewertraining images and to yield more precise segmentations.

Vertebrae: 26 vertebrae in the human vertebral column are divided intodifferent regions corresponding to the curves of the spinal column.Vertebrae in these regions are essentially alike, with minor variation.These regions are called the cervical spine, thoracic spine, lumbarspine, sacrum, and coccyx. There are seven cervical vertebrae (C1-C7),twelve thoracic vertebrae (T1-T12), and five lumbar vertebrae (L1-L5).

To distinguish spine vertebrae with similar shapes and appearances,various existing approaches attempted to address this issue byexploiting anatomical prior knowledge including a spatial order of thespine vertebrae and a distance between neighboring spine vertebrae. Inone example, spine anatomical knowledge is incorporated into neuralnetworks implicitly using Bi-RNN or explicitly using an informationaggregation layer considering the spatial distribution prior knowledgeof the spine vertebrae. In another example, the anatomical priorknowledge has been used to post-process neural network outputs. In theseexamples, the anatomical prior knowledge is not fully utilized. Inparticular, anatomy-inspired network architectures like Bi-RNN rely onthe network to learn the anatomical prior knowledge without theguaranteed respect to the prior knowledge. Building the anatomical priorknowledge into a network layer or an optimization target makes acompromise that turns a hard constraint (e.g., the spatial order, whichshould be strictly enforced) into a soft constraint that can beviolated. As a result, the existing approaches sometimes producephysically implausible predictions (e.g., vertebrae in reversed order,multiple occurrences of a same vertebra.)

When the existing approaches employ information exchange mechanisms(e.g., Bi-RNN and message passing) to bring a global context intovertebra label classification scores, each vertebra label is stillclassified individually at the output stage without imposing anatomicalconstraints. Thus, the existing approaches completely depend on theinformation exchange mechanisms to capture and regulate the spatialrelationships between different spine vertebrae. Existing fusionmechanisms include Bi-RNN and aggregation of neighboring vertebraeactivation maps following the vertebrae distance prior knowledge. TheBi-RNN encourages the massage passing between different spine vertebraein a softly learned way instead of enforcing it in an anatomy coherentmanner. The aggregation of the neighboring vertebrae activation maps isonly reliable for short-range relationships, leaving the globalanatomical prior knowledge insufficiently exploited.

In another example, a specific optimization formulation is used tojointly label the spine vertebrae by formulating a global objectivefunction. However, Markov modeling of vertebra labels is limited tocapture the short-range relationships and errors accumulate with eachMarkov step.

The present disclosure provides a vertebra localization andidentification method that jointly labels all vertebrae with theanatomical constraints to effectively utilize the anatomical priorknowledge and achieve best accuracy and robustness. Specifically, a keypoint localization U-Net model is trained to predict three-dimensional(3-D) activation maps for 26 vertebra centers. Along an automaticallycalculated spine centerline, the 3-D activation maps are warped torectify the spine and aggregated to form one-dimensional (1-D) vertebraactivation signals. An optimization process is then performed on the 1-Dvertebra activation signals by incorporating a hard constraint of thespatial order of the spine vertebrae to limit an optimization searchspace (also referred to as a constraint search space) and a softconstraint of the prior knowledge of distances between the spinevertebrae (i.e., a regularization term in an objective function.) Labelsof all spine vertebrae are searched jointly in the constraint searchspace, which allows global message passing among the spine vertebrae andensures the anatomical plausibility of results.

The vertebra localization and identification method consistent with thepresent disclosure is evaluated based on publicly available benchmarks(e.g., SpineWeb). Training datasets for the method include 242 CT imagesand testing datasets for the method include 60 CT images. FIG. 2illustrates exemplary spine CT images. As shown in FIG. 2, some CTimages have a small field of view (a), a low image quality (b), andmetal implants and severe compression fractures (c, d). The disclosedmethod achieves an identification rate of 97.4%, substantiallyoutperforming the best existing method that achieves the identificationrate of 94.7%.

The vertebra localization and identification method consistent with thepresent disclosure aggregates 3-D vertebra activation maps into 1-Dvertebra activation signals to substantially reduce optimizationcomplexity, exploits the spatial order of the spine vertebrae as thehard constraint of the optimization search space to anatomically ensureplausible results, and includes the vertebra distance prior knowledge asthe soft constraint in the objective function, thereby improving theidentification rate from 94.7% to 97.4% and cutting an error rate byhalf.

Before the era of deep learning, the vertebra localization andidentification relied on hand-crafted low-level image features and/orpriori knowledge. Regression forests and probabilistic graphical modelsare often used to handle the CT images of arbitrary field-of-view.Sparse centroid annotations are further transformed into denseprobabilistic labels for classifier training. A hierarchical strategy isused to learn detectors dedicated to distinctive vertebrae andnon-distinctive vertebrae. However, the above approaches produceerroneous results on challenging pathological images due to the limitedmodeling power of hand-crafted features. In addition, these approachesfail to exploit global contextual information to facilitate vertebraidentification.

Employing deep neural networks to detect spine vertebrae has achievedsubstantially improved performance. In one example, the deep neuralnetworks such as convolutional neural network (CNN) and fullyconvolution network (FCN) have been employed to directly detect thevertebra centers and achieve the vertebra localization andidentification jointly in one stage. In another example, multi-stageapproaches have been employed to locate and identify the spinevertebrae, which can be categorized into top-down or bottom-upstrategies. The top-down strategy locates the whole spine first anddetects individual vertebrae next while the bottom-up strategy firstdetects the landmarks of all vertebrae and then classifies them intorespective vertebrae.

The prior knowledge of spine anatomy has been used to facilitate thevertebra localization and identification. Specifically, domain expertknowledge is used to categorize vertebrae into anchor and bundle setswhich are treated differently in the process. Markov modeling is adoptedto label the spine vertebrae by preserving a consecutive order. In somecases, the anatomical knowledge is learned automatically in adata-driven manner. For example, Bi-RNN is used to enable a machinelearning model to capture the spatial relationships of predictions indifferent spine regions, and a message passing mechanism is used toexploit prior distribution of the distances between the spine vertebraeto regulate the prediction. Adversarial learning has also been employedto encode and impose the anatomical prior knowledge. The multi-stageapproaches may embed the knowledge of the spine anatomy in the top-downand bottom-up representations.

FIG. 1 illustrates a flowchart of an exemplary method for vertebralocalization and identification consistent with embodiments of thepresent disclosure. As shown in FIG. 1, one or more images of vertebraeof a spine is received (S110). Specifically, the one or more images ofsize W×H×L are denoted as I, where W, H, and L are a size of the CTimages, I∈

^(W×H×L), and

is a mathematical expression of real numbers. The goal of the vertebralocalization and identification is to detect centers of vertebrae thatare present in I and to identify their labels. Each of the detectedvertebrae is indexed by V, where V⊆{1, 2, . . . , 26}. The centers ofthe detected vertebrae are labeled as C1-C7 in a cervical vertebraregion, T1-T12 in a thoracic vertebra region, L1-L5 in a lumbar vertebraregion, and S1-S2 in a sacrum vertebra region.

At S120, a machine learning model is applied on the one or more imagesto generate 3-D vertebra activation maps of detected vertebra centers.Specifically, the machine learning model takes the one or more images Ias input and outputs vertebra center annotations P={x_(v), y_(v),z_(v)}, where P is 3D coordinates of the vertebra centers in theoriginal CT images, and v∈V. In one embodiment, the machine learningmodel is a U-Net key point detection model. The U-Net key pointdetection model is trained by a training dataset of CT images, in whichthe vertebra center annotations P are provided. In another embodiment,another suitable machine learning model may be used to generate the 3-Dvertebra activation maps of the detected vertebra centers.

In one embodiment, the U-Net key point detection model is trained toproduce the 3-D activation maps of the detected vertebra centers. TheU-Net key point detection model is trained using a widely adoptedmulti-channel activation map regression approach. The multi-channelground-truth activation maps are generated using Gaussian distributioncentered on the spatial coordinates of the vertebra centers. The U-Netkey point detection model is trained using L2 loss on the predicted andground-truth activation maps. The produced 3-D vertebra activation mapsare denoted as {G_(v)}, where G_(v)∈

^(W×H×L), and v∈V. Although each activation map channel is trained toactivate around the center of the corresponding vertebra, due to therepetitive visual patterns of vertebrae, it is not uncommon for the heapmap to falsely activate on a wrong vertebra or activate on multiplevertebrae, as shown in FIG. 3(a).

In a standard key point detection (or localization) method, eachactivation map channel predicted by the machine learning model isindividually processed by taking the pixel with maximum activation ortaking a centroid to obtain the key point detection results. In theembodiments of the present disclosure, instead of directly processingthe activation map channels to obtain the vertebra centers, anatomicalknowledge is incorporated in the processing to achieve robust andaccurate vertebra localization and identification.

Referring to FIG. 1, at S130, a spine rectification process is performedon the 3-D vertebra activation maps to convert each 3-D vertebraactivation map into a corresponding one-dimensional (1-D) vertebraactivation signal. Specifically, after the 3-D vertebra activation mapsare generated, a spine centerline is extracted and each 3-D vertebraactivation map is aggregated along the centerline to produce the 1-Dvertebra activation signal.

In one embodiment, the activation maps for 26 individual vertebrae arecombined into a combined activation map Ĝ below:

{circumflex over (G)}=Σ_(v=1) ²⁶ G _(v).  (1)

The above equation represents the probability of any vertebra centerwithout differentiating their indices. While the individual activationmap often falsely activates on a wrong vertebra due to repetitive imagepattern, the activations are typically only around vertebra centers.Thus, by combining them into one, the centers of all vertebrae areactivated, as shown in FIG. 3(a).

In one embodiment, the spine centerline is computed from the combinedactivation map Ĝ. It is extracted by tracing mass centers of the axialslices of Ĝ, calculated as the average coordinates of pixels withactivation above 0.5. The spine centerline is denoted as s(t), wheres(t)=(x(t), y(t), z(t)), x(t), y(t), z(t) are coordinates of thecenterline s(t), and t is arc-length parameterization including adistance from a starting point of the centerline s(t). Given the spinecenterline s(t), the 3-D vertebra activation maps {G_(v)} are warped sothat the spine centerline s(t) becomes straight after warping.Specifically, a moving local coordinate system is calculated along thespine centerline and is denoted as

e₁(t), e₂(t), e₃(t)

, where e₃(t) is a tangent vector of the spine centerline s(t), e₂(t) isa unit vector in the normal plane of the spine centerline s(t) with aminimum angle to a y-axis of the image (i.e., a patient's frontdirection), and e₁(t) is a cross product of e₂(t) and e₃(t).

The axes e₁(t) and e₂(t) span the normal plane of the spine centerlines(t), where e₁(t) points at a patient's anterior direction and e₂(t)points at a patient's right direction. Given the spine centerline andthe local coordinate systems, spine rectified vertebra activation mapsG′_(v) and Ĝ′ are produced by warping G_(v) and Ĝ as follows:

G′ _(v)(x, y, z)=G _(v)(s(z)+e₁(x)+e₂(y)),  (2)

where G_(v)(s(z)+e₁(x)+e₂(y)) denotes linear interpolation of G_(v) atthe given coordinate. The warping operation can be seen as resamplingG_(v) in the normal plane of the spine centerline s(t). In the spinerectified vertebra activation maps, the spine centerline s(t) isstraight along z-axis, as shown in FIG. 3(c). The anterior and rightdirections of each vertebra are aligned with x-axis and y-axis.

The spine rectified activation maps G′_(v) and Ĝ′ are further processedto produce 1-D vertebra activation signals, denoted as Q_(v) and{circumflex over (Q)}′, respectively. Specifically, values in G′_(v) aresummed along the x-axis and the y-axis, written as

Q _(v)(z)=Σ_(x,y) G′ _(v)(x, y, z).  (3)

Since the vertebra activation map is aggregated in the normal plane ofthe spine centerline, the produced 1-D vertebra activation signalindicates the likelihood of the vertebra center at given location z onthe spine centerline. Aggregating the activations in the normal planestrengths the activation signal of the vertebra centers and reducing thespine localization search space to the 1-D substantially simplifiessearching complexity, thereby making it possible and affordable to adoptmore complex optimization approaches. Despite the strengthenedactivation, false activations in the original activation map are carriedover to the 1-D activation signal, resulting in false activations in the1-D activation signal, as shown in FIG. 3(c).

Referring to FIG. 1, at S140, an anatomically-constrained optimizationprocess is performed on each 1-D vertebra activation signal to localizeand identify each vertebra center in the one or more images.Specifically, the vertebra centers are localized and identified based onthe 1-D vertebra activation signals {Q_(v)} and {circumflex over (Q)}′by solving an optimization problem. Denoting N as the number of detectedvertebrae and v_(l) as the lowest index among them, since the detectedvertebrae must be consecutive, their indices can be represented by[v_(l), v_(l)+N−1]. The locations of the detected vertebrae are denotedas k={k_(i)}_(i∈[0,N−1]), where i is the vertebra's index relative tov_(l). Thus, k_(i) indicates the z-axis location of the vertebra withabsolute index v_(l)+i. Since N can be represented by k, N is droppedfrom the parameters for the sake of notation simplicity. The parameters(v_(l), k) are optimized to minimize the following energy function:

(v _(l) ,k)=−Σ_(i=0) ^(N−1)λ_(v) _(l) _(+i) Q _(v) _(l) _(+i)(k_(i))+Σ_(i=2) ^(N−2) R(k _(i) −k _(i−1) , k _(i+1) −k _(i)).   (4)

Q_(v) _(l) _(+i)(k_(i)) is the activation value of the vertebra with theabsolute index v=v_(l)+i. R(⋅,⋅) is a regularization term thatencourages the distances between neighboring vertebrae to be similar,and can be written as:

$\begin{matrix}{{R\left( {a,b} \right)} = {\exp\;{\left( {\max\ \left( {\frac{a}{b},\frac{b}{a}} \right)} \right).}}} & (5)\end{matrix}$

λ_(v) denotes weights of the 26 vertebrae. The two vertebrae at bothends of the spine (i.e., C1: cervical-1, C2: cervical-2, S1: the firstsacrum, and S2: the last sacrum) are designated as anchors and theirweights λ_(v) are set to 2. For all other vertebrae, the weights are setto 1. The vertebrae (C1, C2, S1, and S2) at both ends of the spine havemore distinct appearances, and hence are given more weights than others.

In one embodiment, the vertebra centers are jointly searched to maximizethe total vertebra activation score while keeping the distances betweenthe vertebra centers regulated. The search space of (v_(l), k)implicitly imposes a hard constraint that the detected vertebrae must beconsecutive with the indices from v_(l) to v_(l)+N−1.

In one embodiment, the optimization process performed at S140 includesan initialization step followed by iterative updates. The parameters(v_(l), k) are searched in the space: v_(l)∈[1,26], and k_(i)∈[0, L]. Lis a total length of the spine. At the initialization step, v_(l) is setto 1 and the vertebra centers k are set to the coordinates of localmaxima of {circumflex over (Q)} sequentially (i.e., k_(i+1)>k_(i)).After the initialization, three operations including a first operation,a second operation, and a third operation are iteratively applied tosearch the parameters.

The first operation is an offset operation, in which v_(l) is optimizedvia exhaustive search:

$\begin{matrix}\left. v_{l}\leftarrow{\arg{\min\limits_{v_{l}}{{\mathcal{L}\left( {v_{l},k} \right)}.}}} \right. & (6)\end{matrix}$

The second operation is a fine-tune operation, in which {k_(v)} isoptimized via Hill Climbing optimization:

$\begin{matrix}\left. k\leftarrow{\arg{\min\limits_{k}{{\mathcal{L}\left( {v_{l},k} \right)}.}}} \right. & (7)\end{matrix}$

The fine-tune operation adjusts the vertebra centers to minimize thetotal energy concerning both the individual activation signal Q_(v) andthe distance regularization.

The third operation is an expansion operation, in which a new vertebracenter is inserted to k between (u, u+1). Specifically, the expanded kis denoted as E(k, u):

$\begin{matrix}{\left. k\leftarrow{E\left( {k,u} \right)} \right. = \left\{ {\begin{matrix}{k_{i}\ } \\{\left( {k_{i} + k_{i + 1}} \right)/2} \\k_{i + 1}\end{matrix}\begin{matrix}\begin{matrix}{{{if}{\;\ }i} \leq u} \\{{{if}\mspace{9mu} i} = {u + 1.}}\end{matrix} \\{{{if}\mspace{14mu} i} > u}\end{matrix}} \right.} & (8)\end{matrix}$

The insertion location u is searched by minimizing the energy functionbelow:

$\begin{matrix}{u = {\arg{\min\limits_{u \in {\lbrack{0,{N - 2}}\rbrack}}{{\mathcal{L}\left( {v_{l},{E\left( {k,u} \right)}} \right)}.}}}} & (9)\end{matrix}$

The expansion operation addresses missed vertebrae that are not capturedby the local maxima of {circumflex over (Q)}.

The three operations are iteratively applied until the energy functionstarts to increase (i.e., indicating convergence), that is,

_(j)(v_(l), k)>

_(j−1)(v_(l), k), where

_(j−1)(v_(l), k) and

_(j)(v_(l), k) are the energy functions after the (j−1)-th and the j-thiteration of the offset operation, respectively.

Referring to FIG. 1, at S150, one or more images are outputted with eachvertebra center localized and each vertebra label identified.Specifically, the parameters (v_(l), k) associated with lowest

during the process are taken as the optimization output. After thevertebra centers are localized from the 1-D vertebra activation signals,the coordinates of the vertebra centers are mapped back to the one ormore images following a reverse spatial mapping of the spinerectification to produce final 3-D localization results. Outputting theone or more images may include displaying the one or more images on ascreen, printing the one or more images by a printer, transmitting theone or more images to a remote device via a communication network, orstoring the one or more images in a storage device for future viewing.

The pseudo code of the optimization process is shown in FIG. 4.

To evaluate the performance of the vertebra localization andidentification method consistent with the present disclosure, 302 CTimages with vertebra center annotations are obtained from the publiclyavailable benchmarks. The CT images are representative for the vertebralocalization and identification tasks due to various pathologies andimaging conditions that include severe scoliosis, vertebral fractures,metal implants, and small field-of-view (FOV). The 302 CT images aresplit into a training dataset of 242 CT images and a testing dataset of60 CT images.

Commonly used evaluation metrics for vertebra localization andidentification tasks include an identification rate and a localizationerror. The identification rate measures the percentage of vertebrae thatare correctly identified. A vertebra is considered as correctlyidentified if the detected vertebra center and the ground truth aremutually the closest and their distance is within 20 mm. Thelocalization error measures the mean and standard deviation oflocalization errors (in mm) of correctly identified vertebrae. Theevaluation metrics are calculated for the vertebrae overall, as well asseparately for different spine regions (i.e., cervical, thoracic, andlumbar regions.)

In one embodiment, the vertebra localization and identification methodis implemented in PyTorch and is executed by a computer workstation. Thekey point detection model is implemented using nnU-Net. The CT imagesare re-sampled to 0.3×0.3×1.25 mm spacing. In a training stage, the 3-Dpatches of size 128×160×64 voxels are cropped from each CT image asinput. In an inference stage, the trained key point detection model isapplied on non-overlapping patches of the same size to obtain thelocalized vertebra activation maps for the entire CT image. A stochasticgradient descent (SGD) optimizer with a learning rate of 0.01, a weightdecay of 3e−5, and a mini-batch size of 2 are used to train the keypoint detection model for 1,000 epochs.

The vertebra localization and identification method consistent with thepresent disclosure is applied on the test dataset of 60 CT images. Inthe cervical region, the mean error 2.40 mm, the standard deviation 1.18mm, and the identification rate 96.8% are provided. In the thoracicregion, the mean error 2.35 mm, the standard deviation 1.28 mm, and theidentification rate 97.8% are provided. In the lumbar region, the meanerror 3.19 mm, the standard deviation 1.69 mm, and the identificationrate 97.2% are provided. In all regions, the mean error 2.55 mm, thestandard deviation 1.40 mm, and the identification rate 97.4% areprovided.

Nine baseline methods are selected to compare with the disclosed method.The nine baseline methods include a classic method with hand-craftedfeature, techniques with data-driven anatomical prior, and methods withanatomy inspired architectures. While the disclosed method reports theidentification rate 97.4% and the mean error 2.55 mm, the closestcompeting method reports the identification rate 94.7% and the meanerror 2.56 mm. An identification error rate is significantly reducedapproximately from 5.3% to 2.6%, by absolute 2.7% (or relative 50.9%).When evaluated on three spine regions separately, the identificationrates of the disclosed method are still better than all comparisonmethods, except for the lumbar region when compared to one of the ninebaseline methods. On cervical and thoracic spines, the disclosed methodachieves the highest identification rates of 96.8% and 97.8%,respectively. Overall, the disclosed method substantially outperformsthe nine baseline methods.

The baseline method that outperforms other baseline methods in theidentification rate imposes the hard physical constraint by Markovmodeling, which ensures the output to be anatomically plausible. Despitethe performance gain, the baseline method has a noticeable tendency toachieve higher identification rate in the lumbar region but loweridentification rate in the cervical region. Because the baseline methodemploys Markov modeling to trace vertebrae from one end of the spine(i.e., lumbar) to the other end (i.e., cervical). The Markov modelingsuccessfully regulates the consecutive vertebra indices, which leads tosignificant performance gain compared to the other baseline methodswithout such regulation. However, the error accumulates along with thenumber of Markov steps as the process moves toward the cervical end. Incontrast, the disclosed method globally searches and identifies thevertebrae with the constraint of consecutive vertebra indices, whicheliminates the directional bias caused by Markov modeling and results inconsistent performance in all spine regions.

Further, effects and behavior of the spine rectification andanatomically constrained optimization are analyzed via an ablation studyof alternative methods. The most naive alternative to the iterativeoptimization is to take the maximum location of individual 3-Dactivation map G_(v) as the center of the v-th vertebra, denoted as basemodel. A slightly more sophisticated approach is to take the maximumlocation of individual 1-D activation signal Q_(v) as the center of thev-th vertebra. Since this approach employs the spine rectification, itis denoted as base+rectify model. In the above two approaches, sincethere is no constraint applied, physically implausible vertebra ordersthat violate the anatomy can be produced. A more advanced variation isto take the locations of local maximums of {circumflex over (Q)} ascandidates of consecutive vertebrae and to determine v_(l) following theequation (6). This approach ensures the consecutive order of thepredicted vertebrae on top of the spine rectification, and is thusreferred to as base+rectify+order model. This approach is equivalent tothe disclosed method that stops after the offset operation in the firstoptimization iteration. Since the disclosed method employs both thespine rectification and the anatomically constrained optimization, it isreferred to as base+rectify+optim model. The results of the ablationstudy are summarized in Table 1.

TABLE 1 Cervical Thoracic Lumbar All Mean Mean Mean Mean Model error StdId rate error Std Id rate error Std Id rate error Std Id rate base 2.241.25 96.8 2.53 1.53 76.0 2.67 1.66 75.2 2.46 1.48 81.9 base + rect 2.461.64 95.8 2.32 1.63 73.8 3.14 1.70 79.3 2.54 1.68 81.4 Base + rect +order + λ 2.55 1.83 94.2 2.31 1.54 89.4 3.19 1.71 91.0 2.57 1.70 91.2Base + rect + optim + λ 2.40 1.18 96.8 2.35 1.28 97.8 3.19 1.69 97.22.55 1.40 97.4 Base + rect + order 2.54 1.84 93.7 2.33 1.25 87.2 3.151.69 86.9 2.57 1.58 89.0 Base + rect + optim 2.40 1.18 96.3 2.38 1.2894.1 3.15 1.68 92.4 2.55 1.39 94.4

FIG. 5 illustrates comparison of vertebra localization andidentification results of five datasets (1-5) processed by four existingmethods (a-d) and the method (e) consistent with embodiments of thepresent disclosure. The dataset 1 includes CT images in the cervicalregion. The datasets 2-4 includes CT images in the thoracic region andthe lumbar region. The dataset 5 includes CT images with metal implants.The methods include: (a) base, (b) base+rect+order, (c) base+rect+order,(d) base+rect+optim, (e) base+rect+optim+λ (present disclosure). Theground-truth vertebra centers are marked by dots and labels on a leftside of the spine in the CT images. The vertebra centers detected by themethod consistent with the embodiments of the present disclosure aremarked by dots and labels on a right side of the spine in the CT images.A line is drawn between the ground-truth and the detected centers of thesame vertebra for better visualization of the localization error.Similarly, additional examples are shown in FIGS. 7-9.

The purpose of the spine rectification is to enable applying theanatomical constraints in the downstream processing. Thus, employing thespine rectification without imposing the anatomical constraints does notbring any performance gain, as shown by the comparison between basemodel and base+rect model. By imposing an effective/meaningfulconstraint of the vertebra order, base+rect+order model ensuresphysically plausible results and significantly improves theidentification rate over base+rect model from 81.4% to 91.2%. Byemploying the anatomically-constrained optimization, base+rect+optimmodel is able to regulate the distance between predicted vertebrae whilepreserving the physically plausible vertebra order. As a result, theidentification rate is further improved from 91.2% to 97.4%. It is alsoobserved that while the overall identification rate improvessignificantly, the identification rate in the cervical region isconsistently high using different methods. This is because the cervicalvertebrae have a more distinct appearance and can be reliablyrecognized.

The vertebra weights λ also plays an important role by encouraging theoptimization to focus more on the vertebrae that can be reliablydetected by the key point detection model. To analyze the contributionof the vertebra weights, an experiment is conducted to compare theperformances of base+rect+order model and the disclosed method with andwithout using vertebra weights λ. As summarized in Table 1, employingvertebra weights λ leads to improved performance on both base+rect+ordermodel and the disclosed method. In particular, the overallidentification rate is improved from 89.0% and 94.4% to 91.2% and 97.4%on the two methods, respectively. The mean error is not affected much byemploying the vertebra weights λ, which suggests that the vertebraweights λ have little effect on the accuracy of correctly identifiedvertebrae.

FIG. 6 illustrates comparison of identification rates for each of 26vertebrae of various alternative methods consistent with embodiments ofthe present disclosure. As shown in FIG. 6, the disclosed methodoutperforms the three alternative models for each and every vertebra.

FIG. 10 illustrates incorrectly localized and incorrectly identifiedvertebrae of three datasets processed by four existing methods (a-d) andthe method (e) consistent with embodiments of the present disclosure. Asshown in FIG. 10, three failure cases (1-3) are illustrated. Extremepathology and/or low image quality may degrade the performance. Inparticular, the first failure case has severe vertebral compressionfactures, which significantly reduces the height of the vertebrae aswell as the space margins between them. The second failure case has lowimaging quality, making it difficult to differentiate the boundarybetween vertebrae. Consequently, missed detection and false positiveresults are observed in the above two cases, respectively. In the thirdcase, the vertebra centers are correctly located but labels are off byone. The underlying cause of the third failure case is the lack ofdistinct vertebrae that can be reliably recognized. In particular, themore distinct L5 and sacrum vertebrae are not in the field of view. Theimaging appearance of T12 vertebra (i.e., the lowest vertebra with rib)is affected by the metal implant. The analysis of the failure casesshows that severe pathologies and extreme imaging conditions may stillnegatively impact the model's performance on robustness.

The vertebra localization and identification method provided by theembodiments of the present disclosure is highly robust and accurate. Thethorough evaluations based on the publicly available benchmarksdemonstrate that by rectifying the spine (via converting and effectivelysimplifying 3-D vertebra activation maps into 1-D vertebra activationsignals) and jointly localizing all vertebrae following the anatomicalconstraint, the disclosed method outperforms the existing methods bysignificantly large quantitative margins. Further, the ablation studyvalidates the effectiveness of each algorithm component.

FIG. 11 illustrates a structural diagram of an exemplary device forvertebra localization and identification consistent with embodiments ofthe present disclosure. Referring to FIG. 11, the present disclosureprovides a vertebra localization and identification device. The vertebralocalization and identification device performs the disclosed vertebralocalization and identification method. Specifically, the vertebralocalization and identification device includes a memory 1101 storing acomputer program and a processor 1102 configured to execute the computerprogram stored in the memory 1101 to perform: receiving one or moreimages of vertebrae of a spine, applying a machine learning model on theone or more images to generate 3-D vertebra activation maps {G_(v)} ofdetected vertebra centers, performing a spine rectification process onthe 3-D vertebra activation maps {G_(v)} to convert each 3-D vertebraactivation map G, into a corresponding one-dimensional (1-D) vertebraactivation signal Q_(v); performing an anatomically-constrainedoptimization process on each 1-D vertebra activation signal Q_(v) tolocalize and identify each vertebra center in the one or more images;and outputting the one or more images, where on each of the one or moreoutputted images, a location k and an identification v_(l) of eachvertebra center are specified, wherein v_(l)∈[1, 26].

In some embodiments, the machine learning model is a key point detectionmodel that is trained using U-Net as a backbone network to produce the3-D activation maps {G_(v)}, where G_(v)∈

^(W×H×L), and v∈{1, 2, . . . , 26}, W×H×L is a size of the one or moreimages, and

is a mathematical expression of real numbers.

In some embodiments, when performing the spine rectification process onthe 3-D vertebra activation maps {G_(v)} to convert each 3-D vertebraactivation map G_(v) into the corresponding 1-D vertebra activationsignal Q_(v), the processor 1102 is configured to combine the 3-Dvertebra activation maps {G_(v)} into one combined activation map Ĝ,where Ĝ=Σ_(v=1) ²⁶G_(v); extract a spine centerline s(t) based on thecombined activation map Ĝ, where s(t)=(x(t), y(t), z(t)), x(t), y(t),z(t) are x, y, z coordinates of the spine centerline s(t), and t isarc-length parameterization including a distance from a starting pointof the spine centerline s(t); and aggregate each 3-D vertebra activationmap G_(v) along the spine centerline s(t) to produce the 1-D vertebraactivation signal Q_(v), where Q_(v)(z)=Σ_(x,y)G′_(v)(x, y, z).

In some embodiments, when aggregating each 3-D vertebra activation mapG_(v) along the spine centerline s(t) to produce the 1-D vertebraactivation signal Q_(v), the processor 1102 is configured to calculate amoving local coordinate system

e₁(t), e₂(t), e₃(t)

along the spine centerline s(t), where e₃(t) is a tangent vector of thespine centerline s(t), e₂(t) is a unit vector in a normal plane of thespine centerline s(t) with a minimum angle toy-axis of the one or moreimages, and e₁(t) is a cross product of e₂(t) and e₃(t), warp the 3-Dvertebra activation maps {G_(v)} and the combined activation map Ĝ toproduce spine rectified 3-D vertebra activation maps {G_(v′)} and spinerectified combined activation map Ĝ′, where G′_(v)(x, y,z)=G_(v)(s(z)+e₁(x)+e₂(y)), and G_(v)(s(z)+e₁(x)+e₂(y)) denotes linearinterpolation of G_(v) at given (x, y, z) coordinate, x-axis aligns withan anterior direction of each vertebra, y-axis aligns with a rightdirection of each vertebra, and z-axis aligns with the spine centerline;and process the spine rectified activation maps {G′_(v)} and the spinerectified combined activation map Ĝ′ to produce the 1-D vertebraactivation signals {Q_(v)} and {circumflex over (Q)}, respectively,where Q_(v)(z)=Σ_(x,y)G′_(v)(x, y, z).

In some embodiments, when performing the anatomically-constrainedoptimization process on each 1-D vertebra activation signal Q_(v) tolocalize and identify each vertebra center in the one or more images,the processor 1102 is configured to determine a parameter (v_(l), k) foreach 1-D vertebra activation signal Q_(v) by minimizing an energyfunction

(v_(l), k), wherein

(v_(l), k)=−Σ_(i=0) ^(N−1)λ_(v) _(l) _(i)Q_(v) _(l) _(+i)(k_(i))+Σ_(i=2)^(N−2)R(k_(i)−k_(i−1), k_(i+1)−k_(i)), v_(l) is a lowest index of an Nnumber of detected vertebrae, v_(l)∈[1,26], k denotes vertebra centersand k={k_(i)}_(i∈[0,N−1]), i is the vertebra's index relative to v_(l),k_(i) is a location of the vertebra with absolute index v_(l)+i, Q_(v)_(l) _(+i)(k_(i)) is an activation value of the vertebra with theabsolute index v_(l)+i, R(k_(i)−k_(i−1), k_(i+1)−k_(i)) is aregularization term that encourages distances between neighboringvertebrae to be similar and

${{R\left( {{k_{i} - k_{i - 1}},{k_{i + 1} - k_{i}}} \right)} = {\exp\left( {\max\ \left( {\frac{k_{i} - k_{i - 1}}{k_{i + 1} - k_{i}},\frac{k_{i + 1} - k_{i}}{k_{i} - k_{i - 1}}} \right)} \right)}},$

and λ_(v) is weights of the vertebrae and

${\lambda_{v} = \left\{ \begin{matrix}{1,} \\{2,}\end{matrix} \right.}{\begin{matrix}{\ {v \in \left\{ {3,4,\ldots\;,\ 24} \right\}}} \\{\ {v \in \left\{ {1,2,{25},{26}} \right\}}}\end{matrix}.}$

In some embodiments, when determining the parameter (v_(l), k) for each1-D vertebra activation signal Q_(v) by minimizing the energy function

(v_(l), k), the processor 1102 is configured to initialize the parameter(v_(l), k); perform an offset operation to determine v_(l); perform afine-tune operation to determine k; perform an expansion operation todetermine a location of a missing vertebra center; and iterativelyrepeat the offset operation, the fine-tune operation, and the expansionoperation until convergence occurs.

In some embodiments, when initializing the parameter (v_(l), k), theprocessor 1102 is configured to initialize v_(l) to 1 and k tocoordinates of local maxima of {circumflex over (Q)} sequentially,wherein k_(i+1)>k_(i), k_(i)∈[0, L], and L is a total length of thespine.

In some embodiments, when performing the offset operation to determinev_(l), the processor 1102 is configured to perform the offset operationto optimize v_(l) via an exhaustive search

$\left. v_{l}\leftarrow{\arg{\min\limits_{v_{l}}{{\mathcal{L}\left( {v_{l},k} \right)}.}}} \right.$

In some embodiments, when performing the fine-tune operation todetermine k, the processor 1102 is configured to perform the fine-tuneoperation to optimize {k_(v)} via a Hill Climbing optimization process

$\left. k\leftarrow{\arg{\min\limits_{k}{{\mathcal{L}\left( {v_{l},k} \right)}.}}} \right.$

In some embodiments, when performing the expansion operation todetermine the location of the potential missing vertebra center, theprocessor 1102 is configured to perform the expansion operation byinserting a new vertebra center at an insertion location u to k between

$\left. {\left( {u,{u + 1}} \right)\mspace{14mu}{via}\mspace{14mu}{calculating}\mspace{14mu} k}\leftarrow{E\left( {k,u} \right)} \right. = \left\{ {\begin{matrix}{k_{i}\ } \\{\left( {k_{i} + k_{i + 1}} \right)/2} \\{k_{i + 1}\ }\end{matrix}\begin{matrix}\begin{matrix}{{{if}\mspace{9mu} i} \leq u} \\{{{if}\mspace{9mu} i} = {u + 1.}}\end{matrix} \\{{{if}\mspace{9mu} i} > u}\end{matrix}} \right.$

In some embodiments, the insertion location u is searched by minimizingan energy function

(v_(l), E(k, u)), wherein

$u = {\arg{\min\limits_{u \in {\lbrack{0,{N - 2}}\rbrack}}{{\mathcal{L}\left( {v_{l},{E\left( {k,u} \right)}} \right)}.}}}$

In some embodiments, the identification of each vertebra center on theone or more outputted images is specified by labelling seven cervicalvertebrae as C1-C7; twelve thoracic vertebrae as T1-T12; five lumbarvertebrae as L1-L5; and two sacrum vertebrae as S1-S2.

The memory 1101 may include volatile memory such as random-access memory(RAM), and non-volatile memory such as flash memory, hard disk drive(HDD), or solid-state drive (SSD). The memory 1101 may also includecombinations of various above-described memories. The processor 1102 mayinclude a central processing unit (CPU), an embedded processor, amicrocontroller, and a programmable device such as anapplication-specific integrated circuit (ASIC), a field programmablegate array (FPGA), and a programmable logic array (PLD), etc.

The vertebra localization and identification device provided by theembodiments of the present disclosure is highly robust and accurate. Thethorough evaluations based on the publicly available benchmarksdemonstrate that by rectifying the spine (via converting and effectivelysimplifying 3-D detection activation maps into 1-D detection signals)and jointly localizing all vertebrae following the anatomicalconstraint, the disclosed device outperforms the existing methods bysignificantly large quantitative margins.

The present disclosure also provides a computer-readable storage mediumstoring a computer program. The computer program may be loaded to acomputer or a processor of a programmable data processing device, suchthat the computer program is executed by the computer or the processorof the programmable data processing device to implement the disclosedmethod.

Although the principles and implementations of the present disclosureare described by using specific embodiments in the specification, theforegoing descriptions of the embodiments are only intended to helpunderstand the method and core idea of the method of the presentdisclosure. Meanwhile, a person of ordinary skill in the art may makemodifications to the specific implementations and application rangeaccording to the idea of the present disclosure. In conclusion, thecontent of the specification should not be construed as a limitation tothe present disclosure.

What is claimed is:
 1. A vertebra localization and identificationmethod, comprising: receiving one or more images of vertebrae of aspine; applying a machine learning model on the one or more images togenerate three-dimensional (3-D) vertebra activation maps {G_(v)} ofdetected vertebra centers; performing a spine rectification process onthe 3-D vertebra activation maps {G_(v)} to convert each 3-D vertebraactivation map G_(v) into a corresponding one-dimensional (1-D) vertebraactivation signal Q_(v); performing an anatomically-constrainedoptimization process on each 1-D vertebra activation signal Q_(v) tolocalize and identify each vertebra center in the one or more images;and outputting the one or more images, wherein on each of the one ormore outputted images, a location k and an identification v_(l) of eachvertebra center are specified, wherein v_(l)∈[1, 26].
 2. The methodaccording to claim 1, wherein: the machine learning model is a key pointdetection model that is trained using U-Net as a backbone network toproduce the 3-D vertebra activation maps {G_(v)}, wherein G_(v)∈

^(W×H×L), v∈{1, 2, . . . , 26}, W×H×L is a size of the one or moreimages, and

is a mathematical expression of real numbers.
 3. The method according toclaim 1, wherein performing the spine rectification process on the 3-Dvertebra activation maps {G_(v)} to convert each 3-D vertebra activationmap G_(v) into the corresponding 1-D vertebra activation signal Q_(v)includes: combining the 3-D vertebra activation maps {G_(v)} into onecombined activation map Ĝ, wherein Ĝ=Σ_(v=1) ²⁶G_(v); extracting a spinecenterline s(t) based on the combined activation map Ĝ, whereins(t)=(x(t), y(t), z(t)), x(t), y(t), z(t) are x, y, z coordinates of thespine centerline s(t), and t is arc-length parameterization including adistance from a starting point of the spine centerline s(t); andaggregating each 3-D vertebra activation map G_(v) along the spinecenterline s(t) to produce the 1-D vertebra activation signal Q_(v),wherein Q_(v)(z)=Σ_(x,y)G′_(v)(x, y, z).
 4. The method according toclaim 3, wherein aggregating each 3-D vertebra activation map G_(v)along the spine centerline s(t) to produce the 1-D vertebra activationsignal Q_(v) includes: calculating a moving local coordinate system

e₁(t), e₂(t), e₃(t)

along the spine centerline s(t), wherein e₃(t) is a tangent vector ofthe spine centerline s(t), e₂(t) is a unit vector in a normal plane ofthe spine centerline s(t) with a minimum angle toy-axis of the one ormore images, and e₁(t) is a cross product of e₂(t) and e₃(t); warpingthe 3-D vertebra activation maps {G_(v)} and the combined activation mapĜ to produce spine rectified 3-D vertebra activation maps {G′_(v)} andspine rectified combined activation map Ĝ′, wherein G′_(v)(x, y,z)=G_(v)(z)+e₁(x)+e₂(y)), G_(v)(s(z)+e₁(x)+e₂(y)) denotes linearinterpolation of G_(v) at given (x, y, z) coordinate, x-axis aligns withan anterior direction of each vertebra, y-axis aligns with a rightdirection of each vertebra, and z-axis aligns with the spine centerline;and processing the spine rectified activation maps {G′_(v)} and thespine rectified combined activation map Ĝ′ to produce the 1-D vertebraactivation signals {Q_(v)} and {circumflex over (Q)}, respectively,wherein Q_(v)(z)=Σ_(x, y)G′_(v)(x, y, z).
 5. The method according toclaim 1, wherein performing the anatomically-constrained optimizationprocess on each 1-D vertebra activation signal Q_(v) to localize andidentify each vertebra center in the one or more images includes:determining a parameter (v_(l), k) for each 1-D vertebra activationsignal Q_(v) by minimizing an energy function

(v_(l), k), wherein

(v_(l), k)=−Σ_(i=0) ^(N−1)λ_(v) _(l) _(+i)Q_(v) _(l)_(+i)(k_(i))+Σ_(i=2) ^(N−2)R(k_(i)−k_(i−1), k_(i+1)−k_(i)), v_(l) is alowest index of an N number of detected vertebrae, v_(l)∈[1,26], kdenotes vertebra centers and k={k_(i)}_(i∈[0,N−1]), i is the vertebra'sindex relative to v_(l), k_(i) is a location of the vertebra withabsolute index v_(l)+i, Q_(v) _(l+i) (k_(i)) is an activation value ofthe vertebra with the absolute index v_(l)+i, R(k_(i)−k_(i−1),k_(i+1)−k_(i)) is a regularization term that encourages distancesbetween neighboring vertebrae to be similar and${{R\left( {{k_{i} - k_{i - 1}},{k_{i + 1} - k_{i}}} \right)} = {\exp\;\left( {\max\left( {\frac{k_{i} - k_{i - 1}}{k_{i + 1} - k_{i}},\frac{k_{i + 1} - k_{i}}{k_{i} - k_{i - 1}}} \right)} \right)}},$and λ_(v) is weights of the vertebrae and${\lambda_{v} = \left\{ \begin{matrix}{1,} \\{2,}\end{matrix} \right.}{\begin{matrix}{\ {v \in \left\{ {3,4,\ldots\;,\ 24} \right\}}} \\{\ {v \in \left\{ {1,2,{25},{26}} \right\}}}\end{matrix}.}$
 6. The method according to claim 5, wherein determiningthe parameter (v_(l), k) for each 1-D vertebra activation signal Q_(v)by minimizing the energy function

(v_(l), k) includes: initializing the parameter (v_(l), k); performingan offset operation to determine v_(l); performing a fine-tune operationto determine k; performing an expansion operation to determine alocation of a missing vertebra center; and iteratively repeating theoffset operation, the fine-tune operation, and the expansion operationuntil convergence occurs.
 7. The method according to claim 6, whereininitializing the parameter (v_(l), k) includes: initializing v_(l) to 1and k to coordinates of local maxima of {circumflex over (Q)}sequentially, wherein k_(i+1)>k_(i), k_(i)∈[0, L], and L is a totallength of the spine.
 8. The method according to claim 6, whereinperforming the offset operation to determine v_(l) includes: performingthe offset operation to optimize v_(l) via an exhaustive search$\left. v_{l}\leftarrow{\arg{\min\limits_{v_{l}}{{\mathcal{L}\left( {v_{l},k} \right)}.}}} \right.$9. The method according to claim 6, wherein performing the fine-tuneoperation to determine k includes: performing the fine-tune operation tooptimize {k_(v)} via a Hill Climbing optimization process$\left. k\leftarrow{\arg{\min\limits_{k}{{\mathcal{L}\left( {v_{l},k} \right)}.}}} \right.$10. The method according to claim 6, wherein performing the expansionoperation to determine the location of the potential missing vertebracenter includes: performing the expansion operation by inserting a newvertebra center at an insertion location u to k between (u, u+1) viacalculating$\left. k\leftarrow{E\left( {k,u} \right)} \right. = \left\{ {\begin{matrix}{k_{i}\ } \\{\left( {k_{i} + k_{i + 1}} \right)/2} \\{k_{i + 1}\ }\end{matrix}{\begin{matrix}\begin{matrix}{{{if}\mspace{9mu} i} \leq u} \\{{{if}\mspace{9mu} i} = {u + 1}}\end{matrix} \\{{{if}\mspace{9mu} i} > u}\end{matrix}.}} \right.$
 11. The method according to claim 10, wherein:the insertion location u is searched by minimizing an energy function

(v_(l), E(k, u)), wherein$u = {\arg{\min\limits_{u \in {\lbrack{0,{N - 2}}\rbrack}}{{\mathcal{L}\left( {v_{l},{E\left( {k,u} \right)}} \right)}.}}}$12. The method according to claim 1, wherein the identification of eachvertebra center on the one or more outputted images is specified bylabelling seven cervical vertebrae as C1-C7; twelve thoracic vertebraeas T1-T12; five lumbar vertebrae as L1-L5; and two sacrum vertebrae asS1-S2.
 13. A vertebra localization and identification device,comprising: a memory storing a computer program; and a processorconfigured to execute the computer program stored in the memory to:receive one or more images of vertebrae of a spine; apply a machinelearning model on the one or more images to generate three-dimensional(3-D) vertebra activation maps {G_(v)} of detected vertebra centers;perform a spine rectification process on the 3-D vertebra activationmaps {G_(v)} to convert each 3-D vertebra activation map G_(v) into acorresponding one-dimensional (1-D) vertebra activation signal Q_(v);perform an anatomically-constrained optimization process on each 1-Dvertebra activation signal Q_(v) to localize and identify each vertebracenter in the one or more images; and output the one or more images,wherein on each of the one or more outputted images, a location k and anidentification v_(l) of each vertebra center are specified, whereinv_(l)∈[1, 26].
 14. The device according to claim 13, wherein: themachine learning model is a key point localization model that is trainedusing U-Net as a backbone network to produce the 3-D vertebra activationmaps {G_(v)}, wherein G_(v)∈

^(W×H×L),v∈{1, 2, . . . , 26}, W×H×L is a size of the one or moreimages, and

is a mathematical expression of real numbers.
 15. The device accordingto claim 13, wherein when performing the spine rectification process onthe 3-D vertebra activation maps {G_(v)} to convert each 3-D vertebraactivation map G_(v) into the corresponding 1-D vertebra activationsignal Q_(v), the processor is configured to: combine the 3-D vertebraactivation maps {G_(v)} into one combined activation map Ĝ, whereinĜ=Σ_(v=1) ²⁶G_(v); extract a spine centerline s(t) based on the combinedactivation map Ĝ, wherein s(t)=(x(t), y(t), z(t)), x(t), y(t), z(t) arex, y, z coordinates of the spine centerline s(t), and t is arc-lengthparameterization including a distance from a starting point of the spinecenterline s(t); and aggregate each 3-D vertebra activation map G_(v)along the spine centerline s(t) to produce the 1-D vertebra activationsignal Q_(v), wherein Q_(v)(z)=Σ_(x,y), G′_(v)(x, y, z).
 16. The deviceaccording to claim 15, wherein when aggregating each 3-D vertebraactivation map G_(v) along the spine centerline s(t) to produce the 1-Dvertebra activation signal Q_(v), the processor is configured to:calculate a moving local coordinate system (e₁(t), e₂(t), e₃(t)) alongthe spine centerline s(t), wherein e₃(t) is a tangent vector of thespine centerline s(t), e₂(t) is a unit vector in a normal plane of thespine centerline s(t) with a minimum angle toy-axis of the one or moreimages, and e₁(t) is a cross product of e₂(t) and e₃(t); warp the 3-Dvertebra activation maps {G_(v)} and the combined activation map Ĝ toproduce spine rectified 3-D vertebra activation maps {G′_(v)} and spinerectified combined activation map Ĝ′, wherein Ĝ′_(v)(x, y,z)=G_(v)(s(z)+e₁(x)+e₂(y)), G_(v)(s(z)+e₁(x)+e₂(y)) denotes linearinterpolation of G_(v) at given (x, y, z) coordinate, x-axis aligns withan anterior direction of each vertebra, y-axis aligns with a rightdirection of each vertebra, and z-axis aligns with the spine centerline;and process the spine rectified activation maps {G′_(v)} and the spinerectified combined activation map Ĝ′ to produce the 1-D vertebraactivation signals {Q_(v)} and {circumflex over (Q)}, respectively,wherein Q_(v)(z)=Σ_(x,y)G′_(v)(x, y, z).
 17. The device according toclaim 13, wherein when performing the anatomically-constrainedoptimization process on each 1-D vertebra activation signal Q_(v) tolocalize and identify each vertebra center in the one or more images,the processor is configured to: determine a parameter (v_(l), k) foreach 1-D vertebra activation signal Q_(v) by minimizing an energyfunction

(v_(l), k), wherein

(v_(l),k)=−_(i=0) ^(N−1)λ_(v) _(l) _(+i)(k_(i))+Σ_(i=2)^(N−2)R(k_(i)−k_(i−1), k_(i+1)−k_(i)), v_(l) is a lowest index of an Nnumber of detected vertebrae, v_(l)∈[1,26], k denotes vertebra centersand k={k_(i)}_(i∈[0,N−1]), i is the vertebra's index relative to v_(l),k_(i) is a location of the vertebra with absolute index v_(l)+i, Q_(v)_(l) _(+i)(k_(i)) is an activation value of the vertebra with theabsolute index v_(l)+i, R(k_(i)−k_(i−1), k_(i+1)−k_(i)) is aregularization term that encourages distances between neighboringvertebrae to be similar and${{R\left( {{k_{i} - k_{i - 1}},{k_{i + 1} - k_{i}}} \right)} = {\exp\;\left( {\max\left( {\frac{k_{i} - k_{i - 1}}{k_{i + 1} - k_{i}},\frac{k_{i + 1} - k_{i}}{k_{i} - k_{i - 1}}} \right)} \right)}},$and λ_(v) is weights of the vertebrae and${\lambda_{v} = \left\{ \begin{matrix}{1,} \\{2,}\end{matrix} \right.}{\begin{matrix}{\ {v \in \left\{ {3,4,\ldots\;,\ 24} \right\}}} \\{\ {v \in \left\{ {1,2,{25},{26}} \right\}}}\end{matrix}.}$
 18. The device according to claim 17, wherein whendetermining the parameter (v_(l), k) for each 1-D vertebra activationsignal Q_(v) by minimizing the energy function

(v_(l), k), the processor is configured to: initialize the parameter(v_(l), k); perform an offset operation to determine v_(l); perform afine-tune operation to determine k; perform an expansion operation todetermine a location of a missing vertebra center; and iterativelyrepeat the offset operation, the fine-tune operation, and the expansionoperation until convergence occurs.
 19. A non-transitorycomputer-readable storage medium storing a computer program, when beingexecuted by a processor, the computer program performing: receiving oneor more images of vertebrae of a spine; applying a machine learningmodel on the one or more images to generate three-dimensional (3-D)vertebra activation maps {G_(v)} of detected vertebra centers;performing a spine rectification process on the 3-D vertebra activationmaps {G_(v)} to convert each 3-D vertebra activation map G_(v) into acorresponding one-dimensional (1-D) vertebra activation signal Q_(v);performing an anatomically-constrained optimization process on each 1-Dvertebra activation signal Q_(v) to localize and identify each vertebracenter in the one or more images; and outputting the one or more images,wherein on each of the one or more outputted images, a location k and anidentification v_(l) of each vertebra center are specified, whereinv_(l)∈[1, 26].
 20. The non-transitory computer-readable storage mediumaccording to claim 19, wherein: the machine learning model is a keypoint detection model that is trained using U-Net as a backbone networkto produce the 3-D vertebra activation maps {G_(v)}, wherein G_(v)∈

^(W×H×L), v∈{1, 2, . . . , 26}, W×H×L is a size of the one or moreimages, and

is a mathematical expression of real numbers.